High Performance FFT on SGI Altix 3700

Authors

  • Akira Nukada
  • Daisuke Takahashi
  • Reiji Suda
  • Akira Nishida
Abstract

We have developed a high-performance FFT for the SGI Altix 3700 that improves the efficiency of the floating-point operations required to compute the FFT by using a form of loop fusion. As a result, we achieved 4.94 Gflops for a 1-D FFT of length 4096 on a single 1.3 GHz Itanium 2 processor (95% of its 5.2 Gflops peak), and 28 Gflops for a 2-D FFT of size 4096 on 32 processors. Our FFT kernel outperformed other existing libraries.
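The abstract does not describe the fusion scheme in detail. One common way to apply loop fusion to an FFT is to merge two consecutive radix-2 butterfly stages into a single pass, so each group of four values is loaded once, passes through four butterflies, and is stored once (the data flow of a radix-4 step). The Python sketch below illustrates that idea under this assumption only. It is not the authors' kernel, and the names `fft_fused`, `_bit_reverse`, and `dft` are hypothetical. It checks correctness against a naive DFT rather than measuring speed.

```python
import cmath


def _bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of index i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def fft_fused(x):
    """Radix-2 DIT FFT in which pairs of butterfly stages are fused into one pass.

    Illustrative sketch only (hypothetical helper, assumed fusion scheme,
    not the paper's kernel).
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    bits = n.bit_length() - 1
    # Bit-reversal permutation so the stages can run in place.
    a = [0j] * n
    for i, v in enumerate(x):
        a[_bit_reverse(i, bits)] = complex(v)

    m = 1  # butterfly half-width of the first stage in the current pass
    # Each pass fuses the stages with half-widths m and 2m: four operands are
    # loaded once, pass through four butterflies, and are stored once.
    while 4 * m <= n:
        for k in range(0, n, 4 * m):
            for p in range(m):
                w1 = cmath.exp(-2j * cmath.pi * p / (2 * m))         # stage-1 twiddle
                w2a = cmath.exp(-2j * cmath.pi * p / (4 * m))        # stage-2 twiddle, pair (p, p+2m)
                w2b = cmath.exp(-2j * cmath.pi * (p + m) / (4 * m))  # stage-2 twiddle, pair (p+m, p+3m)
                i0, i1, i2, i3 = k + p, k + p + m, k + p + 2 * m, k + p + 3 * m
                x0, x1, x2, x3 = a[i0], a[i1], a[i2], a[i3]
                # Stage with half-width m (two independent butterflies).
                t = w1 * x1
                y0, y1 = x0 + t, x0 - t
                t = w1 * x3
                y2, y3 = x2 + t, x2 - t
                # Stage with half-width 2m, applied before storing.
                t = w2a * y2
                a[i0], a[i2] = y0 + t, y0 - t
                t = w2b * y3
                a[i1], a[i3] = y1 + t, y1 - t
        m *= 4
    # If log2(n) is odd, one unfused radix-2 stage remains.
    if 2 * m == n:
        for p in range(m):
            t = cmath.exp(-2j * cmath.pi * p / n) * a[p + m]
            a[p], a[p + m] = a[p] + t, a[p] - t
    return a


def dft(x):
    """Naive O(n^2) DFT used as a reference."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]
```

Because interpreter overhead dominates in Python, this sketch shows only the data flow, not a speedup. In a compiled kernel the expected benefit would come from halving the number of passes over memory and keeping intermediate values in registers, and the twiddle factors would typically be precomputed rather than evaluated inside the loop.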

Similar resources

Optimizing OpenMP Parallelized DGEMM Calls on SGI Altix 3700

Using functions of parallelized mathematical libraries is a common way to accelerate numerical applications. Computer architectures with shared memory characteristics support different approaches for the implementation of such libraries, usually OpenMP or MPI. This paper’s content is based on the performance comparison of DGEMM calls (floating point matrix multiplication, double precision) with...

Interconnect Performance Evaluation of SGI Altix 3700, Cray X1, Cray Opteron, and Dell PowerEdge

We study the performance of inter-process communication on four high-speed multiprocessor systems using a set of communication benchmarks. The goal is to identify certain limiting factors and bottlenecks with the interconnect of these systems as well as to compare between these interconnects. We used several benchmarks to examine network behavior under different communication patterns and numbe...

Performance Characterization of Matrix Multiplication on SGI Altix 3700

Matrix multiplication is widely used in a variety of applications and is often one of the core components of many scientific computations, including graph theory, numerical methods, digital control, and signal processing. Multiplication of large matrices requires a lot of computation time, as its complexity is $O(n^3)$, where n is the dimension of the matrix. A serial algorithm to compute large ma...

Analyzing Mutual Influences of High Performance Computing Programs on SGI Altix 3700 and 4700 Systems with PARbench

© 2007 by John von Neumann Institute for Computing. Permission to make digital or hard copies of portions of this work for personal or classroom use is granted provided that the copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise requires prior specific permission by the publisher ment...

On the performance of molecular dynamics applications on current high-end systems.

The effective exploitation of current high performance computing (HPC) platforms in molecular simulation relies on the ability of the present generation of parallel molecular dynamics code to make effective utilisation of these platforms and their components, including CPUs and memory. In this paper, we investigate the efficiency and scaling of a series of popular molecular dynamics codes on th...

Publication date: 2007